word rank | frequency | word |
---|---|---|
1 | 40853644 | на |
2 | 28488150 | и |
3 | 21260989 | да |
4 | 17851639 | в |
5 | 17320129 | се |
6 | 16765637 | за |
7 | 16302792 | от |
8 | 15050221 | е |
9 | 10399664 | с |
word rank | frequency | word |
---|---|---|
10 | 7598338 | не |
20 | 2423892 | но |
30 | 1863781 | го |
40 | 1349269 | този |
50 | 1175432 | За |
60 | 1053791 | България |
70 | 939887 | Не |
80 | 808341 | защото |
90 | 706172 | между |
word rank | frequency | word |
---|---|---|
100 | 654660 | й |
200 | 293911 | друг |
300 | 193538 | започва |
400 | 155444 | дума |
500 | 131190 | очите |
600 | 115629 | производство |
700 | 99396 | изобщо |
800 | 90124 | Европейския |
900 | 81486 | храни |
word rank | frequency | word |
---|---|---|
1000 | 75062 | кола |
2000 | 41910 | й. |
3000 | 29059 | раждането |
4000 | 21728 | агенции |
5000 | 17304 | HP |
6000 | 14334 | спасение |
7000 | 12206 | издаването |
8000 | 10641 | резерва |
9000 | 9341 | Коста |
word rank | frequency | word |
---|---|---|
10000 | 8276 | Вечерта |
20000 | 3673 | преодоля |
30000 | 2190 | 5 юли |
40000 | 1494 | санирането |
50000 | 1093 | Клинична |
60000 | 840 | заявявал |
70000 | 668 | възхвалява |
80000 | 544 | Бижута |
90000 | 453 | ранкинг |
word rank | frequency | word |
---|---|---|
100000 | 383 | създай |
200000 | 119 | обострени |
300000 | 57 | Еренбург |
400000 | 34 | фикционалния |
500000 | 22 | Оия |
600000 | 16 | наливния |
700000 | 12 | домакинствани |
800000 | 9 | Stagecoach |
900000 | 8 | стереотипи” |
word rank | frequency | word |
---|---|---|
1000000 | 6 | Wilson’s |
2000000 | 2 | Евстюхина |
3000000 | 1 | MoffettNathanson |
4000000 | 1 | Стадиращата |
5000000 | 1 | микоидес |
6000000 | 1 | умрял,някак |
Words from different frequency regions are shown. For simplicity, the words with rank k·10^n (k = 1, 2, …, 9; n = 0, 1, …) are chosen. For n = 0 this gives the nine most frequent words.
The tables list words at fixed ranks, which can be useful for various comparisons. On average, words should get longer as their rank increases.
For meaningful words at higher ranks we need at least a medium-sized corpus.
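The selection rule and the word-length expectation can be checked with a small sketch. The function names `sample_ranks` and `mean_length` are hypothetical helpers introduced here for illustration; the two word lists are copied from the n = 0 and n = 3 tables above.

```python
def sample_ranks(n):
    """Return the ranks k * 10**n for k = 1, ..., 9."""
    return [k * 10**n for k in range(1, 10)]


def mean_length(words):
    """Average number of characters per word."""
    return sum(len(w) for w in words) / len(words)


# Words at ranks 1..9 (n = 0), copied from the first table.
top = ["на", "и", "да", "в", "се", "за", "от", "е", "с"]

# Words at ranks 1000..9000 (n = 3), copied from the fourth table.
thousands = ["кола", "й.", "раждането", "агенции", "HP",
             "спасение", "издаването", "резерва", "Коста"]

print(sample_ranks(3))
print(mean_length(top), mean_length(thousands))
```

With only nine words per band this is a rough indication rather than a measurement, but the mean length at ranks 1000 to 9000 comes out clearly above the mean for the nine most frequent words.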
For ranks 1000, 2000, …, 9000 (the exponent n = 3 is stored in @k):
set @k=3;
select w_id-100 as rank, freq, word from where w_id-100 in (1*pow(10,@k),2*pow(10,@k), 3*pow(10,@k),4*pow(10,@k),5*pow(10,@k), 6*pow(10,@k),7*pow(10,@k),8*pow(10,@k), 9*pow(10,@k)) order by rank;
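The same selection can be tried end to end with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the table name `words` is hypothetical (the query above does not show one), and the toy rows are the first ten entries from the tables above, stored with the `w_id - 100` offset that the query uses. The alias `word_rank` is used instead of `rank`, because `RANK` is a reserved word in recent MySQL versions.

```python
import sqlite3


def words_at_ranks(conn, n, offset=100):
    """Fetch (rank, freq, word) for ranks k * 10**n, k = 1..9.

    Assumes a hypothetical table `words(w_id, word, freq)` in which
    rank = w_id - offset, as in the query above.
    """
    ranks = [k * 10**n for k in range(1, 10)]
    placeholders = ",".join("?" for _ in ranks)
    sql = (
        "SELECT w_id - ? AS word_rank, freq, word FROM words "
        f"WHERE w_id - ? IN ({placeholders}) ORDER BY word_rank"
    )
    return conn.execute(sql, [offset, offset, *ranks]).fetchall()


# In-memory toy database holding ranks 1..10 from the tables above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT, freq INTEGER)"
)
toy = [
    ("на", 40853644), ("и", 28488150), ("да", 21260989),
    ("в", 17851639), ("се", 17320129), ("за", 16765637),
    ("от", 16302792), ("е", 15050221), ("с", 10399664),
    ("не", 7598338),
]
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?)",
    [(100 + rank, word, freq) for rank, (word, freq) in enumerate(toy, start=1)],
)

for row in words_at_ranks(conn, 0):
    print(row)
```

Computing the rank list in Python and binding it as parameters avoids relying on a `pow()` function in the database engine, which not every SQLite build provides.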
3.2.1. The most frequent 50 words